Rethinking Data Augmentation for Robust Visual Question Answering
Authors
Abstract
Data Augmentation (DA), i.e., generating extra training samples beyond the original training set, has been widely used in today's unbiased VQA models to mitigate language biases. Current mainstream DA strategies are synthetic-based methods, which synthesize new samples by either editing some visual regions/words or re-generating them from scratch. However, these synthetic samples are always unnatural and error-prone. To avoid this issue, a recent DA work composes new augmented samples by randomly pairing pristine images and other human-written questions. Unfortunately, to guarantee that augmented samples have reasonable ground-truth answers, they manually design a set of heuristic rules for several question types, which severely limits its generalization abilities. To this end, we propose a new Knowledge Distillation based Data Augmentation for VQA, dubbed KDDAug. Specifically, we first relax the requirements of reasonable image-question pairs, which can be easily applied to any question type. Then, we design a knowledge distillation (KD) based answer assignment to generate pseudo answers for all composed image-question pairs, which are robust to both in-domain and out-of-distribution settings. Since KDDAug is a model-agnostic DA strategy, it can be seamlessly incorporated into any VQA architecture. Extensive ablation studies on multiple backbones and benchmarks have demonstrated the effectiveness and generalization abilities of KDDAug.
Keywords: VQA, Data augmentation, Knowledge distillation
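The two steps named in the abstract (composing new image-question pairs, then assigning pseudo answers with a teacher) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not state the relaxed pairing requirement, so the sketch assumes the simplest possible rule (any image may be paired with any question written for another image). The names `compose_pairs`, `assign_pseudo_answers`, and `toy_teacher` are hypothetical, and `toy_teacher` is a hard-coded stand-in for a trained VQA model.

```python
import random
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, str]        # (image_id, question)
AnswerDist = Dict[str, float]   # answer -> probability


def compose_pairs(samples: List[Sample], n_new: int, seed: int = 0) -> List[Sample]:
    """Pair existing images with questions written for other images.

    Assumption: every unseen (image, question) combination is allowed.
    """
    rng = random.Random(seed)
    existing = set(samples)
    images = sorted({img for img, _ in samples})
    questions = sorted({q for _, q in samples})
    candidates = [(i, q) for i in images for q in questions
                  if (i, q) not in existing]
    rng.shuffle(candidates)
    return candidates[:n_new]


def assign_pseudo_answers(
    pairs: List[Sample],
    teacher: Callable[[str, str], AnswerDist],
) -> List[Tuple[str, str, AnswerDist]]:
    """Label each composed pair with the teacher's soft answer distribution."""
    return [(img, q, teacher(img, q)) for img, q in pairs]


def toy_teacher(image_id: str, question: str) -> AnswerDist:
    """Hypothetical stand-in for a trained VQA model (ignores the image)."""
    if question.lower().startswith("is"):
        return {"yes": 0.7, "no": 0.3}
    return {"unknown": 1.0}
```

In this sketch the pseudo answers are kept as soft distributions rather than a single label, which is the usual form of a distillation target. A student model would then be trained on the original samples together with the pseudo-labeled ones.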
Similar resources
Data Augmentation for Visual Question Answering
Data augmentation is widely used to train deep neural networks for image classification tasks. Simply flipping images can help learning by increasing the number of training images by a factor of two. However, data augmentation in natural language processing is much less studied. Here, we describe two methods for data augmentation for Visual Question Answering (VQA). The first uses existing sema...
Robust Question Answering
A Question Answering (QA) system should provide a short and precise answer to a question in natural language, by searching a large knowledge base consisting of natural language text. The sources of the knowledge base are widely available, for written natural language text is a preferential form of human communication. The information ranges from the more traditional edited texts, for example en...
An Exploration of Data Augmentation and RNN Architectures for Question Ranking in Community Question Answering
The automation of tasks in community question answering (cQA) is dominated by machine learning approaches, whose performance is often limited by the number of training examples. Starting from a neural sequence learning approach with attention, we explore the impact of two data augmentation techniques on question ranking performance: a method that swaps reference questions with their paraphrases...
Investigating Embedded Question Reuse in Question Answering
This paper presents a novel method in question answering (QA) that enables a QA system to gain performance by reusing information in the answer to one question to answer another related question. Our analysis shows that a pair of questions in general open-domain QA can have an embedding relation through their mentions of noun phrase expressions. We present methods f...
Revisiting Visual Question Answering Baselines
Visual question answering (VQA) is an interesting learning setting for evaluating the abilities and shortcomings of current systems for image understanding. Many of the recently proposed VQA systems include attention or memory mechanisms designed to support “reasoning”. For multiple-choice VQA, nearly all of these systems train a multi-class classifier on image and question features to predict ...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2022
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-031-20059-5_6